Integrating Correlation Clustering and Agglomerative Hierarchical Clustering for Holistic Schema Matching
Authors
Abstract
Corresponding Author: Basel Alshaikhdeeb, Faculty of Information Science and Technology, National University of Malaysia, Bangi, Malaysia. Email: [email protected]

Abstract: Holistic schema matching is the process of taking several schemas as input and outputting the correspondences among them. Processing a large number of schemas can take a long time and yield poor-quality matches. Therefore, several clustering approaches have been proposed to reduce the search space by partitioning the data into smaller portions, which facilitates the matching process. However, there is still a need to improve the partitioning mechanism by avoiding the random initial solutions (centroids) that result from the clustering process. Such random solutions have a significant impact on the matching results. This study aims to integrate correlation clustering and agglomerative hierarchical clustering to improve the effectiveness of holistic schema matching. The proposed integrated method avoids both random initial solutions and a predefined number of centroids. Several preprocessing steps were performed using auxiliary information (a domain dictionary). The experiments were carried out on the Airfare, Auto and Book datasets from the UIUC Web Integration Repository. The proposed method was compared with the K-means and K-medoids clustering methods. As a result, the proposed method outperformed K-means and K-medoids, achieving accuracies of 0.9, 0.93 and 0.9 for Airfare, Auto and Book, respectively.
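To make the idea of clustering without random centroids or a predefined cluster count more concrete, the following is a minimal sketch, not the authors' method. It assumes attribute labels are compared by Jaccard similarity over word tokens and merged bottom-up with average linkage until no pair of clusters reaches a similarity threshold. The function names, the sample labels and the threshold value of 0.5 are all hypothetical choices made for illustration.

```python
from itertools import combinations


def jaccard(a, b):
    """Token-level Jaccard similarity between two attribute labels (assumed measure)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def average_link(c1, c2):
    """Average pairwise similarity between two clusters of labels."""
    return sum(jaccard(a, b) for a in c1 for b in c2) / (len(c1) * len(c2))


def agglomerate(labels, threshold=0.5):
    """Merge the most similar pair of clusters until none reaches the threshold.

    Every label starts as its own cluster, so there are no random initial
    centroids, and the number of clusters is decided by the threshold
    rather than fixed in advance.
    """
    clusters = [[label] for label in labels]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), average_link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return clusters


# Hypothetical attribute labels, loosely in the style of airfare and book forms.
labels = ["departure city", "city of departure", "return date",
          "date of return", "author"]
result = agglomerate(labels)
```

With these sample labels the procedure stops at three clusters on its own, which is the property the abstract emphasises over K-means and K-medoids. The threshold plays the role that the number of centroids plays in those methods, so it still has to be chosen, for example by validation on a labelled subset.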
Similar resources
Document Retrieval using Hierarchical Agglomerative Clustering with Multi-view point Similarity Measure Based on Correlation: Performance Analysis
Clustering is one of the most interesting and important tools for research in data mining and other disciplines. The aim of clustering is to find the relationship among the data objects, and classify them into meaningful subgroups. The effectiveness of clustering algorithms depends on the appropriateness of the similarity measure between the data in which the similarity can be computed. This pap...
Efficient Clustering and Matching for Object Class Recognition
In this paper we address the problem of building object class representations based on local features and fast matching in a large database. We propose an efficient algorithm for hierarchical agglomerative clustering. We examine different agglomerative and partitional clustering strategies and compare the quality of obtained clusters. Our combination of partitional-agglomerative clustering give...
Integrate template matching and statistical modeling for speech recognition
We propose a novel approach of integrating template matching with statistical modeling to improve continuous speech recognition. We use multiple Gaussian Mixture Model (GMM) indices to represent each frame of speech templates, use hierarchical agglomerative clustering to generate template representatives, and use log likelihood ratio as the local distance measure for DTW template matching in la...
Categorization via Agglomerative Correspondence Clustering
This paper presents computationally efficient object detection, matching and categorization via Agglomerative Correspondence Clustering (ACC). We implement ACC for feature correspondence and object-based image matching exploiting both photometric similarity and geometric consistency from local invariant features. Object-based image matching is formulated here as an unsupervised multi-class clust...
2 Review of Agglomerative Hierarchical Clustering Algorithms
Hierarchical methods are a well-known clustering technique that can be potentially very useful for various data mining tasks. A hierarchical clustering scheme produces a sequence of clusterings in which each clustering is nested into the next clustering in the sequence. Since hierarchical clustering is a greedy search algorithm based on a local search, the merging decision made early in the agglo...
Journal title:
- JCS
Volume 11, Issue
Pages -
Publication date 2015